Filter Tokens (by POS Ratios) (Text Processing)

Synopsis

Filters tokens based on part-of-speech (POS) ratio criteria.

Description

This operator keeps only tokens that fulfill the specified criteria on Part of Speech (POS) ratios. It calculates the proportions of verbs, nouns, and adjectives for each token and keeps only the tokens that reach the specified minimum ratio for each of those types.
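The ratio check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the operator's actual implementation: it assumes a token may consist of several already tagged words, that the three minimum ratios are combined with a logical AND, and that the tag names `ADJ`, `NOUN` and `VERB` as well as the helper functions `pos_ratios` and `keep_token` are hypothetical. The keyword arguments reuse the names of the `min_ratio_*` parameters listed under Parameters.

```python
from collections import Counter

# Hypothetical coarse tag names; the operator's real tag set is not documented here.
ADJ, NOUN, VERB = "ADJ", "NOUN", "VERB"


def pos_ratios(tagged_words):
    """Return the share of adjectives, nouns and verbs among the words of one token."""
    total = len(tagged_words)
    if total == 0:
        return {ADJ: 0.0, NOUN: 0.0, VERB: 0.0}
    counts = Counter(tag for _, tag in tagged_words)
    # Counter yields 0 for tags that never occur, so missing types get ratio 0.0.
    return {tag: counts[tag] / total for tag in (ADJ, NOUN, VERB)}


def keep_token(tagged_words, min_ratio_adjectives=0.0,
               min_ratio_nouns=0.0, min_ratio_verbs=0.0):
    """Keep a token only if every ratio reaches its minimum (AND is an assumption)."""
    ratios = pos_ratios(tagged_words)
    return (
        ratios[ADJ] >= min_ratio_adjectives
        and ratios[NOUN] >= min_ratio_nouns
        and ratios[VERB] >= min_ratio_verbs
    )


# Hand-tagged example tokens; each token is a list of (word, tag) pairs.
tokens = [
    [("fast", ADJ), ("car", NOUN)],
    [("drive", VERB), ("home", NOUN)],
    [("very", "ADV"), ("quickly", "ADV")],
]

# Keep only tokens in which at least half of the words are nouns.
kept = [t for t in tokens if keep_token(t, min_ratio_nouns=0.5)]
```

With `min_ratio_nouns=0.5`, the first two example tokens pass the check and the adverb-only token is dropped. Leaving a threshold at 0.0 in this sketch effectively disables the check for that word type.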

Input

  • document

    The document port.

Output

  • document

    The document port.

Parameters

  • language_source

    Specifies whether the language is set explicitly by the user or read from a metadata attribute of the document.

  • language

    The language used by the part-of-speech (POS) tagger.

  • language_attribute

    The metadata attribute key that contains the ISO language code of the document.

  • min_ratio_adjectives

    The minimum ratio of adjectives a token must have to be kept.

  • min_ratio_nouns

    The minimum ratio of nouns a token must have to be kept.

  • min_ratio_verbs

    The minimum ratio of verbs a token must have to be kept.